
A method and a device for source coding 

FIELD OF THE INVENTION 


The present invention relates generally to source coding of data. In particular, the
invention concerns predictive speech coding methods that represent a speech signal via a
speech synthesis filter and an excitation signal thereof.

BACKGROUND OF THE INVENTION

Modern wireless communication systems such as GSM (Global System for Mobile
communications) and UMTS (Universal Mobile Telecommunications System) transfer
various types of data over the air interface between network elements such as a
base station and a mobile terminal. As the general demand for transfer capacity
continuously rises, e.g. due to new multimedia services becoming available, new and more
efficient data compression techniques have to be developed, as radio
frequencies can nowadays be considered scarce resources. Data compression is
traditionally also used for reducing storage space requirements in computer
systems, for example. Likewise, different methods for picture, video, music and speech
coding have been developed during the last few decades.

Data is usually compressed (compacted) by a so-called encoder and
subsequently regenerated with a decoder for later exploitation whenever needed. Data
coding techniques may be classified according to a number of different approaches.
One is based on the coding result the (en)coder produces: a lossless encoder compacts
the source data without actually losing any information during the encoding process, i.e.
after decoding the data matches the unencoded data perfectly, whereas a lossy
coder produces a compacted representation of the source data, the decoding result of
which no longer corresponds completely to the original representation. However,
data loss is not a problem in situations wherein the user of the data either cannot
distinguish the differences between the original and the once compacted data, or the
differences do not, at least, cause severe difficulties or objections in exploiting the slightly
degraded data. As human senses, including hearing and vision, are somewhat limited,
it is, for example, possible to remove unnecessary details from pictures, video or audio
signals without considerably disturbing the final sensory effect. Source coders often
produce fixed-rate output, meaning that the compaction ratio does not depend on the input
data. Alternatively, a variable-rate coder takes statistics of the input signal into account
while analysing it, thus outputting compacted data at a variable rate. Variable-rate
coding certainly has some benefits over fixed-rate models. Considering e.g. the field of
speech coding, a variable-rate codec (coder-decoder) can maximise the capacity and
minimise the average bit rate for a given speech quality.
This originates from the non-stationarity (or quasi-stationarity) of a typical human speech signal:
a single speech segment (the coders process a certain period of speech at a time) may comprise
either a very homogeneous signal (e.g. a periodically repetitive voiced sound) or a strongly
fluctuating signal (transitions etc.), which directly affects the minimum number of bits
required for a sufficient representation of the segment under analysis. In addition,
especially in mobile networks, savings achieved in source coding may be used
for enhancing e.g. channel coding, thus resulting in a better tolerance against interference
on the radio path. Fixed-rate coders always need to operate at a compromise rate that is
low enough to save transmission capacity but high enough to code difficult segments
with adequate quality, the compromise rate obviously being unnecessarily high for
"easier" speech segments.

Still, as the nature and targeted use of the source data define, on a case-by-case basis, the
optimum means for compacting it, the idea of a generic optimum coder directly
applicable to any possible scenario is utopian; development of source coding has
diverged into many directions, exploiting the data statistics and the imperfections of
human senses to the maximum extent in a specialized manner.

In mobile networks, a speech coder is definitely one of the most crucial
elements in providing the caller/callee with a satisfactory call experience, in addition to
various voice storage and voice message services. Modern speech coders have a
common starting point: a compact representation of digitised speech while preserving
speech quality, truly a subjective measure concerning e.g. speech intelligibility and
naturalness, although sometimes also "objectively" measured by utilizing weighted
distortion measures; the techniques used in modeling, however, vary greatly. One speech
coding model heavily utilized today is called CELP (Code Excited Linear Prediction).
CELP coders like GSM EFR (Enhanced Full Rate), the UMTS adaptive multi-rate coder
AMR and TETRA ACELP (Algebraic Code Excited Linear Prediction) belong to the
group of AbS (Analysis by Synthesis) coders and produce the speech parameters by
modeling the speech signal via minimizing an error between the original and
synthesized speech in a loop. CELP coders carry features from both waveform
(common PCM etc.) and vocoder techniques.
Vocoders are parametric coders that exploit, for example, a source-filter approach in
speech parameterisation. The source models the signal originating from the air flow emitted
from the lungs to the glottis through either vibrating vocal cords (resulting in voiced sounds) or
stiff vocal cords (resulting in unvoiced sounds with turbulence originating from different shapes
within the vocal tract) up to the oral cavities (mouth, throat), to be finally radiated out
through the lips.



Figure 1 discloses a generic sketch of a simplified human speech production model,
called an LP (Linear Predictive) model, that is utilized in many contemporary speech
coding methods like CELP. The process is called linear prediction since the current output
S(n) is determined by a weighted sum of previous output values and an input value
generated by pulse source 102 or noise source 104 depending on the nature of the speech,
which is roughly divided into voiced speech in the former and unvoiced speech in the latter case. Pulse
source 102, emitting an impulse train, imitates the vibration at the glottis with a
corresponding fundamental frequency, called the pitch frequency, with a certain pitch
period. The source type may be altered during the synthesis process via switch 106. Before
the excitation source signal is filtered with all-pole IIR (Infinite Impulse Response) filter
110 modeling the vocal tract, it is multiplied by a proper gain factor in multiplier 108.
Therefore, speech synthesis can be performed by first classifying the current
speech segment under consideration as either voiced or unvoiced, and then by driving
the excitation signal of the selected type through a multiplier and a synthesis filter.
More about LP and speech modeling or coding in general can be found in reference
[1].
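The LP model of figure 1 can be sketched in a few lines. This is a minimal illustration, not a codec implementation: it assumes the common sign convention A(z) = 1 + a(1)z^-1 + ... + a(m)z^-m, so that filter 110 computes S(n) = gain*u(n) - sum a(i)S(n-i). The function names and the Gaussian noise source are illustrative choices.

```python
import random


def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Switch 106: impulse train (pulse source 102) or white noise (noise source 104)."""
    if voiced:
        # One unit impulse per pitch period imitates the glottal vibration.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def lp_synthesis(u, a, gain=1.0):
    """Multiplier 108 followed by all-pole filter 110, i.e. gain / A(z), zero initial state.

    Assumed convention: A(z) = 1 + sum_i a[i-1] z^-i, so
    S(n) = gain * u(n) - sum_{i=1..m} a[i-1] * S(n - i).
    """
    s = []
    for n, un in enumerate(u):
        acc = gain * un
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= a[i - 1] * s[n - i]
        s.append(acc)
    return s


# Voiced segment: a pitch period of 4 samples driven through a first-order filter.
speech = lp_synthesis(excitation(8, voiced=True, pitch_period=4), a=[-0.5], gain=2.0)
```

Keeping the source selection and the filter in separate functions mirrors the model: the same filter coefficients can be driven by either source type.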

A typical CELP coder, presented in figure 2, and a corresponding decoder, presented in
figure 3, comprise several filters for modeling speech generation, namely at least a
short-term filter, such as an LP(C) synthesis filter used for modeling the spectral
envelope (formants, i.e. resonances introduced by the vocal tract), and a long-term filter, the
purpose of which is to model the oscillation of the vocal cords inducing periodicity in
the voiced excitation signal, which comprises impulses separated by the current pitch period,
called a lag. The modeling is substantially targeted to a single speech segment, hereinafter called a
frame, at a time. As can be noticed from figure 3, the decoder structure
resembles the common LP synthesis model with an additional LTP (Long-Term
Prediction) filter. The excitation signal is created on the basis of an excitation vector
for the respective block. For example, in ACELP coders the excitation consists of a
fixed number of non-zero pulses, the positions and amplitudes of which are selected by
utilizing a search in which a perceptually weighted error term between the original and
synthesized speech frame is minimized.

Considering CELP encoding and decoding in more detail, a preview of the codec internals
is presented herein. The encoder includes short-term analysis function 204 to form a
set of direct-form filter coefficients called LP parameters a(i), where i=1,2,...,m (m
thus defining the order of the analysis), for example. Parameters a(i) are calculated
once for a speech frame of N samples, N corresponding e.g. to a time period of 20
milliseconds. As speech has a quasi-stationary nature, meaning it may be considered
stationary if the inspection period is short enough (<=20 ms), optimum filter
coefficients can be calculated on a frame-by-frame basis by utilizing standard mathematical
means such as Wiener filter theory, which requires signal stationarity. The resulting equation,
whose direct solution requires a computationally demanding matrix inversion, may
then be solved efficiently by exploiting e.g. the so-called autocorrelation method and
Levinson-Durbin recursion. See reference [2] for further information. LP parameters
a(i) are exploited in searching for the lag value that best matches the speech frame under
analysis, in calculating a so-called LP residual by filtering the speech with the LPC
analysis (or "inverse") filter A(z), being the inverse of LPC synthesis filter 1/A(z), and
naturally as coefficients of LPC synthesis filter 210 while creating a synthesized
speech signal ss(n). The lag value is calculated in LTP analysis block 202 and used by
LTP synthesis filter 208. The long-term predictor, and the corresponding synthesis filter
208 being its inverse, is typically like an LP predictor with a single tap only.
The tap may optionally have a gain factor g2 of its own (thus defining the total gain of
the one-tap LTP filter). LP parameters are also utilized in the excitation codebook
search as described below.
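The autocorrelation method and Levinson-Durbin recursion mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than any codec's exact procedure: no analysis window or lag windowing is applied (practical codecs such as AMR use both), and A(z) = 1 + sum a(i)z^-i is assumed, so the LP residual is e(n) = s(n) + sum a(i)s(n-i).

```python
def autocorrelation(s, max_lag):
    """r(k) = sum_n s(n) s(n-k) over the frame, for k = 0..max_lag (no windowing)."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(max_lag + 1)]


def levinson_durbin(r, m):
    """Solve the order-m normal equations; return a(1..m) and the prediction error energy.

    Assumes A(z) = 1 + sum_i a(i) z^-i and r[0] > 0.
    """
    a = [1.0] + [0.0] * m
    err = r[0]
    for i in range(1, m + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # error energy shrinks at every stage
    return a[1:], err


def lp_residual(s, a):
    """Inverse (analysis) filter A(z): e(n) = s(n) + sum_i a(i) s(n-i), zero initial state."""
    return [s[n] + sum(a[i - 1] * s[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
            for n in range(len(s))]


# A decaying first-order signal s(n) = 0.9^n should give a(1) close to -0.9, a(2) close to 0.
frame = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lp_residual(frame, a)
```

The recursion solves the Toeplitz normal equations in O(m^2) operations instead of a general matrix inversion, which is why it pairs naturally with the autocorrelation method.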

In a basic CELP coder, after the definition of a proper lag value T and LP parameters a(i),
iteration for the best excitation codebook vector according to the selected error
criteria is started. In some advanced coding models it is possible to fine-tune the lag
value or even the LP parameters while searching for the best excitation vector. During an
iteration round, excitation vector c(n) is selected from codebook 206 and filtered through
LTP and LPC synthesis filters 208, 210, and the resulting synthesized speech ss(n) is
finally compared 218 with the original speech signal s(n) in order to determine the
difference, error e(n). Weighting filter 212, which is based on the characteristics of human
hearing, is used to weight error signal e(n) in order to attenuate frequencies at which the
error is less important according to auditory perception, and correspondingly to
amplify frequencies that matter more. For example, errors in the areas of "formant
valleys" may be emphasized, as errors in the synthesized speech are less audible
at the formant frequencies due to the auditory masking effect. Codebook search
controller 214 is used to define index u of the code vector in codebook 206 according
to the weighted error term acquired from weighting filter 212. Consequently, index u
indicating the excitation vector leading to the minimum possible weighted error is
eventually selected. Controller 214 also provides scaling factor g that is multiplied 216
with the code vector under analysis before LTP and LPC synthesis filtering. After a
frame has been analysed, the parameters describing the frame (a(i), LTP parameters like T
and optionally also gain g2, codebook vector index u or another identifier thereof, and
codebook scaling factor g) are sent over the transmission channel (air interface, fixed
transfer medium etc.) to the speech decoder at the receiving end.
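The analysis-by-synthesis loop of figure 2 can be reduced to a toy search. This sketch makes simplifying assumptions that a real codec does not: LTP filter 208 and weighting filter 212 are omitted, the synthesis filter starts from zero state, and the codebook is a short explicit list. For each candidate, the least-squares gain g is computed and the index u with the smallest squared error is kept; the function names are hypothetical.

```python
def synthesize(c, a):
    """LPC synthesis 1/A(z) with zero initial state, A(z) = 1 + sum_i a(i) z^-i (assumed)."""
    out = []
    for n, cn in enumerate(c):
        out.append(cn - sum(a[i - 1] * out[n - i] for i in range(1, len(a) + 1) if n - i >= 0))
    return out


def abs_search(target, codebook, a):
    """Return (index u, gain g, squared error) minimizing ||target - g * synthesize(c_u)||^2."""
    best = None
    for u, c in enumerate(codebook):
        y = synthesize(c, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero vector cannot reduce the error
        g = sum(t * v for t, v in zip(target, y)) / energy  # optimal gain for this vector
        err = sum((t - g * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (u, g, err)
    return best


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
a = [-0.5]
target = [2.0 * v for v in synthesize(codebook[1], a)]  # built from vector 1 with gain 2
u, g, err = abs_search(target, codebook, a)
```

Because the target was synthesized from code vector 1 with gain 2, the search recovers that index and gain with zero error.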

Referring to figure 3, excitation codebook 306 corresponds to the one in the encoder and is
used for generating excitation signal c(n) on the basis of the received codebook index u.
Excitation signal c(n) is then multiplied 312 with scaling factor g and directed to the LTP
synthesis filter supplied with the necessary parameters T and g2. Finally, the effect of the
vocal tract is added to the synthesized speech signal by LPC synthesis filtering 310,
providing decoded speech signal ss(n) as an output.
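A minimal sketch of the decoder chain of figure 3 follows: the scaled code vector g*c(n) passes through a one-tap LTP synthesis filter 1/(1 - g2*z^-T) and then through the LPC synthesis filter 1/A(z). Integer lags, zero filter memories and the sign convention of A(z) are assumptions made for brevity; real decoders keep filter memories across sub-frames and typically support fractional lags.

```python
def ltp_synthesis(x, T, g2, memory=None):
    """One-tap long-term synthesis y(n) = x(n) + g2 * y(n - T).

    `memory` holds at least T past output samples (oldest first); zeros if omitted.
    """
    past = list(memory) if memory is not None else [0.0] * T
    y = []
    for n, xn in enumerate(x):
        delayed = y[n - T] if n >= T else past[len(past) + n - T]
        y.append(xn + g2 * delayed)
    return y


def lpc_synthesis(x, a):
    """All-pole 1/A(z), A(z) = 1 + sum_i a(i) z^-i (assumed), zero initial state."""
    out = []
    for n, xn in enumerate(x):
        out.append(xn - sum(a[i - 1] * out[n - i] for i in range(1, len(a) + 1) if n - i >= 0))
    return out


def decode_subframe(c, g, T, g2, a):
    """Code vector -> gain (multiplier 312) -> LTP synthesis -> LPC synthesis 310."""
    scaled = [g * v for v in c]
    return lpc_synthesis(ltp_synthesis(scaled, T, g2), a)


# With a=[] (A(z) = 1) only the periodic LTP contribution is visible.
ss = decode_subframe([1.0, 0.0, 0.0, 0.0, 0.0], g=1.0, T=2, g2=0.5, a=[])
```

A single input pulse is repeated every T samples with decaying amplitude g2, which is the periodicity the long-term filter is meant to model.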


Considering next the fixed codebook vector selection in an ACELP type speech encoder,
the pulse positions are determined by minimizing the error between the actual weighted
input speech and a synthesized version thereof:

$$E = \lVert \mathbf{s}_w - g_2\mathbf{H}\mathbf{v} - g\mathbf{H}\mathbf{c} \rVert^2 \qquad (1)$$

where $\mathbf{s}_w$ is the perceptually weighted input speech, $\mathbf{H}$ is an LP model impulse response
matrix utilizing the calculated LP parameters, $\mathbf{c}$ is the selected codebook vector and $\mathbf{v}$ is a
so-called "adaptive codebook" vector explained later in the text. The minimization of
the above error is in practice performed by maximizing the term:

$$\frac{(\mathbf{x}^{T}\mathbf{H}\mathbf{c}_k)^2}{\mathbf{c}_k^{T}\mathbf{H}^{T}\mathbf{H}\mathbf{c}_k} \qquad (2)$$

where $\mathbf{x} = \mathbf{s}_w - g_2\mathbf{H}\mathbf{v}$ is hereinafter called a "target signal", being equivalent to the
perceptually weighted input speech signal from which the contribution of the adaptive
codebook has been removed, and k is the index of the fixed codebook vector $\mathbf{c}_k$ under analysis.
The concept of the adaptive codebook is illustrated in figure 4, which discloses the CELP
synthesis model in an alternative manner quite similar to the common human
speech production model of figure 1. However, the main difference lies in the
excitation signal generation part: as seen from figure 4, in CELP coders the selection of
voiced/unvoiced excitation is usually not made at all, and the excitation includes
adaptive codebook part 402 and fixed codebook part 404, corresponding to excitation
signals v(n) and c(n) respectively, which are first individually weighted by g2 and g and then
summed 408 together to form final excitation u(n) for LPC synthesis filter 410. Thus
the periodicity of the LP residual, modeled in figures 2 and 3 with a separate LTP filter
connected in series with the LPC synthesis filter, can alternatively be depicted as a
feedback loop, adaptive codebook 402 comprising a delay element controlled by
lag value T.
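The feedback-loop view of figure 4 can be written directly in terms of an excitation history buffer. This sketch assumes an integer lag T and a history buffer holding at least T past samples; the adaptive codebook contribution is then v(n) = u(n - T), so for lags shorter than the sub-frame the loop reuses samples of u(n) produced earlier in the same sub-frame, just as the series LTP filter does. The buffer handling and function name are illustrative choices.

```python
def celp_excitation(c, g, T, g2, history):
    """Figure 4: u(n) = g2 * v(n) + g * c(n) with v(n) = u(n - T) (adder 408).

    `history` is the past excitation (oldest first, at least T samples long).
    Returns the sub-frame excitation and the updated history buffer.
    """
    if len(history) < T:
        raise ValueError("excitation history shorter than the lag")
    buf = list(history)
    start = len(buf)
    for n, cn in enumerate(c):
        v = buf[start + n - T]  # delay element of adaptive codebook 402
        buf.append(g2 * v + g * cn)
    return buf[start:], buf


# One past pulse, lag 3: it reappears every 3 samples, halved each time.
u, hist = celp_excitation([0.0] * 6, g=1.0, T=3, g2=0.5, history=[0.0, 0.0, 1.0])
```

The returned buffer is what the next sub-frame would use as its history, which is exactly where the adaptive codebook gets its name.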

To concretise the goal of the algebraic fixed codebook search that is performed after the
LPC and LTP analysis stages, an imaginary target signal of a single frame that should
be modeled with an algebraic codebook to the maximum extent is presented in figure 5.
Now if two pulses are to be allocated per frame (bold arrows), the optimum positions for
them are near peaks 502, 504 in order to minimize the energy left in the remaining
error signal. In this particular example, exactly two pulses with adjustable sign can be
included in the frame. In a typical encoder, the number of codebook pulses per frame
and their amplitudes are predefined, although the overall amplitude of codebook
vector c(n) can be altered via gain factor g. In addition to mere frames, the original
signal may be divided into a number of sub-frames (e.g. 1-4) as well, which are then
separately parameterised in relation to all or some of the required parameters. For
example, the LPC analysis that results in the LPC coefficients may be executed only once per
frame, so that a single set of LP parameters covers the whole frame, whereas codebook
vectors (fixed algebraic and/or adaptive) can be analysed for each sub-frame.

Gain factor g can be calculated by

$$g = \frac{\mathbf{x}^{T}\mathbf{H}\mathbf{c}}{\mathbf{c}^{T}\mathbf{H}^{T}\mathbf{H}\mathbf{c}} \qquad (3)$$
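The target signal, the search criterion (2), the two-pulse case of figure 5 and the gain formula for g can be sketched together. The following assumptions are illustrative and not taken from any particular codec: H is the lower-triangular convolution matrix built from a truncated impulse response h(n), the code vector holds exactly two unit pulses with free signs, and all position pairs are tried exhaustively (practical ACELP coders use structured pulse tracks and fast, non-exhaustive searches). The helper names are hypothetical.

```python
from itertools import combinations, product


def impulse_matrix(h, L):
    """Lower-triangular Toeplitz H with H[n][k] = h(n - k)."""
    return [[h[n - k] if 0 <= n - k < len(h) else 0.0 for k in range(L)] for n in range(L)]


def target_signal(sw, v, g2, h):
    """x = s_w - g2 * H v: weighted speech minus the adaptive codebook contribution."""
    H = impulse_matrix(h, len(sw))
    return [sw[n] - g2 * sum(H[n][k] * v[k] for k in range(len(v))) for n in range(len(sw))]


def two_pulse_search(x, h):
    """Maximize (x^T H c)^2 / (c^T H^T H c) over code vectors with two signed unit pulses."""
    L = len(x)
    H = impulse_matrix(h, L)
    d = [sum(x[n] * H[n][k] for n in range(L)) for k in range(L)]  # H^T x
    phi = [[sum(H[n][i] * H[n][j] for n in range(L)) for j in range(L)]
           for i in range(L)]                                       # H^T H
    best, best_val, best_terms = None, -1.0, None
    for p1, p2 in combinations(range(L), 2):
        for s1, s2 in product((1, -1), repeat=2):
            corr = s1 * d[p1] + s2 * d[p2]                                   # x^T H c
            energy = phi[p1][p1] + phi[p2][p2] + 2 * s1 * s2 * phi[p1][p2]  # c^T H^T H c
            if energy > 0 and corr * corr / energy > best_val:
                best_val = corr * corr / energy
                best, best_terms = ((p1, s1), (p2, s2)), (corr, energy)
    corr, energy = best_terms
    return best, corr / energy  # pulses and gain g = x^T H c / c^T H^T H c


x = target_signal(sw=[0.0, 3.5, 0.5, 0.0, -2.0, 0.0],
                  v=[0.0, 1.0, 1.0, 0.0, 0.0, 0.0], g2=0.5, h=[1.0])
pulses, g = two_pulse_search(x, h=[1.0])
```

With h = [1.0] the matrix H is the identity, so the two pulses land on the two largest target peaks (positions 1 and 4, with matching signs), as in figure 5. Flipping both signs gives the same criterion value; the search keeps the first one found, which yields a positive gain here.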

Although contemporary methods for modeling and regenerating an applicable
excitation signal for the LP synthesis filter seem to provide fairly adequate results in
many cases, a number of problems still exist therein. Depending on
the original input signal, the prediction error may or may not have serious peaks left in


wo 2005/034090 



7 



PCT/FI2004/000579 



the time domain representation. The scenario can vary, and thus the fixed number of
corrective pulses per frame may sometimes be enough to raise the modeling accuracy
to a moderate level but sometimes not. Occasionally, as with some of the existing
speech coders, the modeling result may actually get worse by adding unnecessary
pulses to the excitation signal when the codec specifications do not allow altering the
number of pulses in a single frame. On the other hand, if the number of pulses in a
frame, and thus the total output bit rate, is varied, the modeling process is surely more
flexible but also more complex as regards the reception of variable-length frames etc.
A variable output bit rate may also complicate network planning, as the transmission
resources required by a single connection for transferring speech parameters are no longer
fixed.



Figure 8A discloses a target signal in a scenario wherein a frame has been divided into
four sub-frames. LPC analysis is performed once per frame, and LTP and fixed
codebook analysis on a sub-frame basis. The target signal comprises severe
fluctuations 802, 804, 806, 808 in sub-frame 3. However, as the algebraic code vectors
contain only two pulses, the pulses may be placed to cover peaks 802 and 804, but
peaks 806 and 808 are left intact, thus degrading the modeling result.

Another defect in prior art coders relates to the so-called closed-loop search of the adaptive
codebook vector relating to the LTP analysis.



Usually an open-loop analysis is executed first in order to find a rough estimate of
lag T and gain g2, concerning e.g. a whole frame at a time. During the open-loop search, a
weighted speech signal is simply correlated with delayed versions of itself, one at a time, in
order to locate correlation maxima. Considering the found occurrences of these
autocorrelation maxima, the corresponding delay values, in principle especially the
one producing the highest maximum, then provide a moderate prediction of lag term T, as the
correlation maximum often results from the speech signal periodicity.
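The open-loop estimate can be sketched as a normalized autocorrelation search over a lag range. The normalization by the energy of the delayed signal, the lag bounds and the function name are assumptions chosen for illustration (AMR uses a related but more elaborate procedure); the returned gain is simply the least-squares value at the chosen lag.

```python
import math


def open_loop_lag(sw, start, length, t_min, t_max):
    """Return (T, g2) maximizing sum sw(n)sw(n-k) / sqrt(sum sw(n-k)^2), t_min <= k <= t_max.

    The analysed samples are sw[start:start+length]; requires start >= t_max so that
    every delayed sample lies in the available past.
    """
    if start < t_max:
        raise ValueError("not enough past samples for the largest lag")
    frame = range(start, start + length)
    best_lag, best_score, best_gain = None, -math.inf, 0.0
    for k in range(t_min, t_max + 1):
        corr = sum(sw[n] * sw[n - k] for n in frame)
        energy = sum(sw[n - k] ** 2 for n in frame)
        if energy <= 0.0:
            continue
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score, best_gain = k, score, corr / energy
    return best_lag, best_gain


# Pulse train with a period of 20 samples, as an idealized voiced signal.
sw = [1.0 if n % 20 == 0 else 0.0 for n in range(200)]
T, g2 = open_loop_lag(sw, start=120, length=80, t_min=10, t_max=30)
```

Without the normalization, lags whose delayed window happens to have high energy would be favoured regardless of how well they actually match.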






Thereafter, in a more accurate closed-loop adaptive codebook search, LTP filter lag T
and gain g2 values are determined by minimizing the weighted error between the
original and synthesized speech, as in the algebraic fixed codebook search. This is
achieved e.g. in the AMR codec on a sub-frame basis by maximizing the term:

$$R(k) = \frac{\sum_{n=0}^{L} x(n)\,y_k(n)}{\sqrt{\sum_{n=0}^{L} y_k(n)\,y_k(n)}} \qquad (4)$$
where L is the sub-frame length (e.g. 40 samples) minus one, $y_k(n) = v_k(n) * h(n)$ (* denoting
convolution) and $y_k$ is thus the past LP synthesis filtered excitation (adaptive codebook vector)
at delay k. More details about open/closed-loop searches, especially in the case of the AMR codec, can be
found in reference [3]. However, as the actual excitation for the span of
the current frame is still unknown when the above term is maximised, the current LP
residual is used as a substitute in scenarios with short delay values. See figure 9A for
clarification. If delay k is short enough, i.e. signal $y_k$ requires samples from the current
sub-frame, no excitation for the current sub-frame is available yet, as the algebraic
search is still to be conducted. Therefore, a straightforward solution is to use the already
available LP residual (which may be calculated initially even for the whole frame) as a
substitute for the missing part of the excitation vector, corresponding to the time period
between legends 902 and 904. On the other hand, a buffer for the previous excitation can
usually be made large enough (the three dots in the figure emphasize this) in order to avoid
situations where delay k is correspondingly too long and the required excitation is no longer
available in the buffer.
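The prior-art closed-loop search of equation (4), including the substitution of the LP residual for excitation samples that do not exist yet (figure 9A), can be sketched as follows. Integer lags, a truncated impulse response and the helper names are assumptions; fractional lags and the exact buffer layout of a real codec are omitted.

```python
import math


def adaptive_vector(past_exc, residual, k, L):
    """v_k(n) = exc(n - k); samples at or after the sub-frame start are not yet
    known, so the current LP residual is substituted for them (figure 9A)."""
    if len(past_exc) < k:
        raise ValueError("excitation buffer too short for lag k")
    return [past_exc[len(past_exc) + n - k] if n < k else residual[n - k] for n in range(L)]


def filtered(v, h):
    """y_k(n) = v_k(n) * h(n), truncated to the sub-frame length."""
    return [sum(h[j] * v[n - j] for j in range(len(h)) if n - j >= 0) for n in range(len(v))]


def closed_loop_lag(x, past_exc, residual, h, t_min, t_max):
    """Return the lag k maximizing R(k) of equation (4)."""
    best_k, best_r = None, -math.inf
    for k in range(t_min, t_max + 1):
        y = filtered(adaptive_vector(past_exc, residual, k, len(x)), h)
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        r = sum(a * b for a, b in zip(x, y)) / math.sqrt(energy)
        if r > best_r:
            best_k, best_r = k, r
    return best_k


past = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
T = closed_loop_lag([0.0, 1.0, 0.0, 0.0], past, residual=[0.0] * 4, h=[1.0], t_min=2, t_max=5)
```

For lags shorter than the sub-frame, the tail of v_k is taken from `residual` rather than from a true excitation, which is precisely the approximation the time-advanced excitation of the invention is meant to remove.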

SUMMARY OF THE INVENTION 

The object of the present invention is to improve the excitation signal modeling and
alleviate the existing defects in contemporary source coding, e.g. speech coding,
methods. The object is achieved by introducing the concept of time-advanced
excitation generation. The excitation signal generated by, for example, a fixed excitation
codebook is determined in advance so as to partly cover the next frame or sub-frame as well,
in addition to the current frame. Hence the codebook is "time advanced", e.g. by half of
the (sub-)frame length forward. This is achieved without increasing the overall coding
delay whenever a frame look-ahead is applied in the coding procedure in any case.
Look-ahead is an additional buffer that already exists in many state-of-the-art speech
coders and includes samples from the following frame. The reason why the look-ahead
buffer is originally included in the encoders is based on the LP modeling: during the
LPC analysis of the current frame it has been found advantageous to take the
forthcoming frame into account as well in order to guarantee a sufficiently smooth transition
between adjacent frames.
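The time-advanced window can be illustrated with simple index arithmetic. In this sketch the shift, the sub-frame length and the assumption that the target is available for the whole frame plus its look-ahead are illustrative, and the function name is hypothetical. It shows why no extra delay is needed: the advanced window of the last sub-frame only reaches into the look-ahead, so the shift must not exceed the look-ahead length.

```python
def advanced_target(target, sub_len, i, shift):
    """Return the target samples covered by the time-advanced codebook of sub-frame i.

    `target` holds the current frame followed by its look-ahead samples. The ordinary
    sub-frame covers [i*sub_len, (i+1)*sub_len); the advanced one is moved `shift`
    samples forward, so for the last sub-frame it reaches into the look-ahead.
    """
    start = i * sub_len + shift
    end = start + sub_len
    if end > len(target):
        raise ValueError("shift exceeds the available look-ahead")
    return target[start:end]


# Frame of 4 sub-frames x 4 samples (values 0..15) plus 4 look-ahead samples (16..19).
target = [float(n) for n in range(20)]
window = advanced_target(target, sub_len=4, i=3, shift=2)  # half a sub-frame forward
```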

The aforesaid procedure offers a clear advantage over the prior art, especially when the
LP residual has occasional peaks embedded. This results from the fact that the
number of pulses in a (sub-)frame may actually be doubled by advancing pulses from a certain
frame to the adjacent next frame. Thus the invention entails the benefits of
variable-rate source coding on a frame-by-frame basis, while the true bit rate of the encoded signal
at the output is fixed, and the overall system complexity remains at a relatively low
level compared to solutions with traditional variable-rate coders. The core invention is
nevertheless applicable to both fixed-rate and variable-rate coders.

Correspondingly, as the true time-advanced excitation can be used instead of the LP residual
during the closed-loop search of the adaptive codebook parameters, the error signal
modeling result is improved.

According to the invention, a source coding method enabling at least partial
subsequent reconstruction of source data with a synthesis filter and an excitation signal
thereof has the steps of

-dividing the source data signal into consecutive blocks,
-extracting a first set of parameters related to said filter describing properties of a
first block covering a first time period, and
-extracting a second set of parameters related to said excitation signal for said
filter, where said second set of parameters is determined from and describing
properties of both the first block and a second block following the first block
within a second time period starting later than said first time period and extending
outside said first time period.

In another aspect of the invention, a method for decoding an encoded data signal divided
into consecutive blocks has the steps of
-obtaining a first set of parameters for constructing a synthesis filter, said first set
of parameters describing properties of a first block covering a first time period,
-obtaining a second set of parameters for constructing an excitation signal for
said synthesis filter, said second set of parameters describing properties of both
the first block and a second block following the first block within a second time
period starting later than said first time period and extending outside said first
time period,
-obtaining at least part of a previous second set of parameters for constructing an
excitation signal for said synthesis filter, said previous second set of parameters
describing properties of said first block during at least the time period between
the beginning of said first time period and the beginning of said second time
period,
-combining the contribution of said previous second set of parameters and said
second set of parameters for said excitation signal within the first time period,
-constructing an excitation signal of said first block for said synthesis filter by
utilizing said combination, and
-filtering said constructed excitation signal through said synthesis filter.
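The combining step of the decoding method can be sketched as below. The sketch assumes the shift is a parameter (half a sub-frame in the example) and that each time-advanced vector has already been scaled by its own gain; both assumptions, and the function name, are illustrative. The previous advanced vector supplies the first `shift` samples of the current block and the current vector supplies the rest, which is how two vectors of two pulses each can place up to four pulses in one sub-frame.

```python
def combine_excitation(prev_advanced, curr_advanced, shift):
    """Excitation of the current block from two time-advanced vectors of length L.

    prev_advanced covers [shift - L, shift) relative to the block start, so its last
    `shift` samples fall inside the block; curr_advanced covers [shift, L + shift), so
    only its first L - shift samples do (the rest belongs to the next block).
    """
    L = len(curr_advanced)
    if len(prev_advanced) != L or not 0 <= shift <= L:
        raise ValueError("vectors must have equal length and 0 <= shift <= L")
    return prev_advanced[L - shift:] + curr_advanced[:L - shift]


# Each advanced vector carries two pulses; with shift = L/2 all four land in one block.
prev_vec = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -1.0, 0.0]
curr_vec = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0]
u = combine_excitation(prev_vec, curr_vec, shift=4)
```

With shift = 0 the function returns the current vector unchanged, i.e. the conventional, non-advanced case.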

In a further aspect of the invention, an electronic device for encoding source data
divided into consecutive blocks to be represented by at least a first and a second set of
parameters comprises processing means and memory means for processing and storing
instructions and data, and data transfer means for accessing data, and the device is
arranged to determine said second set of parameters describing properties of both a
first block covering a first time period, properties of said first block described by said
first set of parameters, and a second block following the first block within a second
time period starting later than said first time period and extending outside said first
time period.

In a further aspect of the invention, an electronic device for decoding source data
divided into consecutive blocks comprises processing means and memory means for
processing and storing instructions and data, and data transfer means for accessing
data, and the device is arranged to obtain

a first set of parameters for constructing a synthesis filter, said first set of parameters
describing properties of a first block covering a first time period,

a second set of parameters for constructing an excitation signal for said synthesis filter,
said second set of parameters describing properties of both the first block and a second
block following the first block within a second time period starting later than said first
time period and extending outside said first time period,

at least part of a previous second set of parameters for constructing an excitation signal
for said synthesis filter, said previous second set of parameters describing properties of
said first block during at least the time period between the beginning of said first time
period and the beginning of said second time period,

said device being further arranged to combine the contribution of said previous second set of
parameters and said second set of parameters for said excitation signal within said first
time period,

to construct an excitation signal of said first block for said synthesis filter by utilizing
said combination, and




to filter said constructed excitation signal through said synthesis filter. 

In a further aspect of the invention, a computer program for encoding source data
divided into consecutive blocks to be represented by at least a first and a second set of
parameters comprises code means to determine said second set of parameters
describing properties of both a first block covering a first time period, properties of
said first block described by said first set of parameters, and a second block following
the first block within a second time period starting later than said first time period and
extending outside said first time period.

In still a further aspect of the invention, a computer program for decoding source data
represented by at least a first and a second set of parameters, where said first set of
parameters relates to a synthesis filter and said second set of parameters to an excitation
signal for said filter, said data being divided into consecutive blocks, said first set of
parameters describing properties of a first block covering a first time period and said
second set of parameters describing properties of both the first block and a second
block following the first block within a second time period starting later than said first
time period and extending outside said first time period, comprises code means,

by utilizing at least part of a previous second set of parameters for constructing an
excitation signal for said synthesis filter, said previous second set of parameters
describing properties of said first block during at least the time period between the
beginning of said first time period and the beginning of said second time period,

to combine the contribution of said previous second set of parameters and said second
set of parameters for said excitation signal within said first time period,

to construct an excitation signal of said first block for said synthesis filter by utilizing
said combination, and

to filter said constructed excitation signal through said synthesis filter.

The term "set" refers generally to a collection of one or more elements, e.g. 



In an embodiment of the invention, the proposed method for excitation generation is
utilized in a CELP type speech coder. A speech frame is divided into sub-frames that
are analysed first as a whole, then one at a time. In order to determine an advanced
excitation signal, the target signal and the fixed codebook are shifted, for example, half
a sub-frame forward during the analysis stage.

Accompanying dependent claims disclose embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS 

Hereinafter the invention is described in more detail by reference to the attached
drawings, wherein

Fig. 1 discloses a human speech production model.
Fig. 2 illustrates a block diagram of a typical CELP speech encoder.
Fig. 3 illustrates a block diagram of a typical CELP speech decoder.
Fig. 4 depicts a CELP synthesis model for speech generation.

Fig. 5 discloses a typical scenario in CELP type speech encoding where the target
signal is modeled with a fixed number of pulses included in a single code vector.
Fig. 6 illustrates a block diagram of a CELP encoder according to the invention.
Fig. 7 illustrates a block diagram of a CELP decoder according to the invention.

Fig. 8A illustrates target signal modeling with a fixed two pulses per sub-frame in a
conventional speech codec.

Fig. 8B illustrates target signal modeling with a maximum of four pulses per sub-frame
in accordance with the invention.
Fig. 9A illustrates a scenario wherein the LP residual has to be used as a substitute for the
true excitation signal in a closed-loop LTP parameter search of conventional codecs.

Fig. 9B illustrates a scenario wherein the time-advanced excitation is readily available for
further use in a closed-loop LTP parameter search of the current invention.

Fig. 10 discloses a flow diagram of the method of the invention for encoding a data
signal.

Fig. 11 discloses a flow diagram of the method of the invention for decoding an
encoded data signal.

Fig. 12 discloses a block diagram of a device according to the invention.

DETAILED DESCRIPTION OF THE EMBODIMENT OF THE INVENTION 

Figures 1-5, 8A, and 9A were already discussed in conjunction with the description of 
related prior art. 


Figure 6 discloses, by way of example only, a block diagram of a CELP encoder
utilizing the proposed technique of time advancing the excitation signal. LPC analysis
is performed once per frame, and LTP analysis and excitation search for every sub-frame,
in a frame comprising four sub-frames. The codec also includes a look-ahead
buffer for input speech.

The encoding process of the invention comprises general steps similar to those of the prior art
methods. LPC analysis 604 provides LP parameters, and LTP analysis 602 results in lag
T and gain g2 terms. The optimal excitation search loop comprises codebook 606,
multiplier 616, LTP/adaptive codebook and LPC synthesis filters 608, 610, adder 618,
weighting filter 612 and search logic 614. In addition, memory 622 for storing the
selected excitation vector or an indication thereof for a certain sub-frame, and combine
logic 620 for joining the last half of the previously selected and stored excitation vector, which
was calculated during the analysis of the previous sub-frame but targeted for the first half of
the current sub-frame, and the first part of the currently selected excitation vector for
gain determination as described later, are included.

The first difference between the prior-art solutions and that of the invention occurs in
connection with the calculation of the target signal for the excitation codebook search.
If the excitation code vector is shifted, for example, half a sub-frame ahead, its latter
half resides in the next sub-frame. For the last sub-frame in a frame, the look-ahead
buffer may be exploited correspondingly. In addition, the amount of shifting can be
varied on the basis of a separate (e.g. manually controlled) shift control parameter or
of the characteristics of the input data, for example. The parameter may be received
from an external entity, e.g. from a network entity such as a radio network controller
in the case of a mobile terminal. The input data may be statistically analysed and, if
deemed necessary (e.g. if occasional peak formations are found in the target signal),
shifting can be dynamically introduced into the coding process or the existing shift
may be altered. The selected shift parameter value can then be transmitted to the
receiving end (to be used by the decoder) either separately or embedded in the speech
frames or signalling. The transmission may occur, e.g., once per frame or upon a change
in the parameter value.
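The two transmission policies mentioned above (send the shift value once per frame, or only when it changes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: the function names `clamp_shift` and `shift_updates` are hypothetical, the shift is expressed in samples, and it is clamped to the look-ahead length so that the advanced target always stays computable from buffered input.

```python
from typing import List, Optional


def clamp_shift(requested: int, look_ahead: int) -> int:
    """Limit a requested excitation time shift to the range [0, look_ahead] samples."""
    return max(0, min(requested, look_ahead))


def shift_updates(requested_shifts: List[int], look_ahead: int,
                  every_frame: bool = False) -> List[Optional[int]]:
    """Per frame, return the shift value to transmit, or None if nothing is sent.

    every_frame=True  -> the value is sent once per frame.
    every_frame=False -> the value is sent only when it changes (hypothetical policy).
    """
    sent: List[Optional[int]] = []
    previous: Optional[int] = None
    for requested in requested_shifts:
        shift = clamp_shift(requested, look_ahead)
        if every_frame or shift != previous:
            sent.append(shift)
        else:
            sent.append(None)  # decoder keeps using the last received value
        previous = shift
    return sent
```

With a 20-sample look-ahead, a request of 40 is clamped to 20, so `shift_updates([20, 20, 40, 40, 0], 20)` only signals the first frame and the final change.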

In figure 8B, a portion of a target signal (effectively a speech signal from which the
effect of the adaptive codebook has been removed, as described hereinbefore), divided
into a frame of four sub-frames, and a look-ahead buffer are disclosed. The optimal
excitation code vector is determined by minimizing the error

(5)
where the target is the new time-advanced target signal, comprising the latter half of
the current sub-frame's target and the first half of the following sub-frame's target.
The division is visible in figure 8B: the target (sub-)frame windows are shifted 810
half a sub-frame ahead in time in relation to the corresponding sub-frames. In this
example, the look-ahead buffer is half a sub-frame long, thus limiting (or, in other
words, enabling) the possible time shift between target and actual sub-frames to the
same amount, i.e. the time shift lies between 0 and L/2, where L is the length of a
sub-frame. As a generalization, the shift should be equal to or less than the length of
the look-ahead buffer if a proper target signal is always to be calculable from the
input signal actually present in the buffer. Note that memory 622 is not utilized in
calculating the excitation vector.
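A minimal sketch of forming the time-advanced target windows, assuming the target for one frame is available as a flat list of samples followed by the look-ahead samples. `advanced_targets` and its parameter names are hypothetical. With `shift = sub_len // 2` it reproduces the half-sub-frame advance of figure 8B, and the check `shift <= look_ahead` encodes the limit stated above.

```python
from typing import List, Sequence


def advanced_targets(target: Sequence[float], frame_len: int, sub_len: int,
                     look_ahead: int, shift: int) -> List[List[float]]:
    """Split one frame's target into sub-frame windows advanced by `shift` samples.

    `target` holds the frame's target samples followed by at least `look_ahead`
    samples computed from the look-ahead buffer.
    """
    if not 0 <= shift <= look_ahead:
        raise ValueError("shift must lie between 0 and the look-ahead length")
    if len(target) < frame_len + look_ahead:
        raise ValueError("target must include the look-ahead samples")
    windows = []
    for start in range(0, frame_len, sub_len):
        s = start + shift  # window begins `shift` samples after the sub-frame start
        windows.append(list(target[s:s + sub_len]))
    return windows
```

For a frame of 8 samples split into two sub-frames of 4 with a 2-sample look-ahead, `advanced_targets(list(range(12)), 8, 4, 2, 2)` yields `[[2, 3, 4, 5], [6, 7, 8, 9]]`; the last window takes samples 8 and 9 from the look-ahead part.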



Optionally, if the impulse response matrix H has also been calculated on a sub-frame
basis, a time shift equal to that of the target signal may be introduced to it for
minimizing the error defined by equation 5. Correspondingly, if none of the speech
parameters is actually modeled on a sub-frame basis and only whole frames are analysed,
this makes no substantial difference to the applicability of the invention.
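For reference, the impulse response matrix H in a CELP search is conventionally the lower-triangular Toeplitz matrix built from the impulse response h(n) of the weighted synthesis filter, so that Hc is the zero-state filtered code vector. The sketch below builds it in plain Python. Applying the optional time shift would amount to deriving h(n) from filter parameters valid for the advanced window; that is an assumption about how the option might be realised and is not shown. Function names are hypothetical.

```python
from typing import List, Sequence


def impulse_response_matrix(h: Sequence[float], length: int) -> List[List[float]]:
    """Lower-triangular Toeplitz matrix with H[n][k] = h[n - k] for n >= k."""
    return [[h[n - k] if 0 <= n - k < len(h) else 0.0 for k in range(length)]
            for n in range(length)]


def mat_vec(H: Sequence[Sequence[float]], c: Sequence[float]) -> List[float]:
    """Matrix-vector product H c, i.e. the zero-state filtered code vector."""
    return [sum(row[k] * c[k] for k in range(len(c))) for row in H]
```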

Referring to equation 2, the pulse positions for an advanced excitation vector are
calculated in the same way in this case, but with the time-advanced target and,
optionally, with a similarly advanced impulse response matrix. Whether the gain factor
is also advanced is largely an academic question, as the gain factor is not needed in
this solution model for determining the optimal excitation.
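The search with the advanced target can be illustrated by an exhaustive search under the usual CELP criterion: choose the code vector that maximises (x^T H c)^2 / (c^T H^T H c), where x is the advanced target. This is equivalent to minimising the squared weighted error with the optimal gain substituted, which is why no gain is needed during the search. Whether equation 2 has exactly this form is an assumption. `search_codebook` and the toy codebook are hypothetical; practical ACELP codecs use structured pulse searches rather than full enumeration.

```python
from typing import List, Sequence


def filtered(h: Sequence[float], c: Sequence[float]) -> List[float]:
    """y = H c: zero-state convolution of c with h, truncated to len(c)."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def search_codebook(target: Sequence[float], h: Sequence[float],
                    codebook: Sequence[Sequence[float]]) -> int:
    """Return the index maximising (x^T H c)^2 / (c^T H^T H c); gain-independent."""
    best_index, best_score = 0, float("-inf")
    for i, c in enumerate(codebook):
        y = filtered(h, c)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero filtered vector cannot reduce the error
        corr = sum(x * v for x, v in zip(target, y))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

Because the score uses the squared correlation, a negative pulse in the target is matched by the same code vector with a negative gain.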


Meanwhile, the codebook gain g for the excitation vector is calculated on the basis of
the actual sub-frame as follows

(6)

where $\bar{\mathbf{c}}$ is a joint excitation vector

$$\bar{\mathbf{c}} = \begin{bmatrix} \mathbf{c}_1 \\ \mathbf{c}_2 \end{bmatrix} \qquad (7)$$

consisting of the subvectors $\mathbf{c}_1 = c_{i-1}(k),\ k = L/2+1, \ldots, L$ and
$\mathbf{c}_2 = c_i(k),\ k = 1, \ldots, L/2$, where $c_i$ corresponds to the excitation
vector calculated in the i:th sub-frame and L is the length of the sub-frame and of the
excitation vector. This time the contents of memory 622 are needed in the procedure in
order to provide the latter half of the previous sub-frame's vector to the joint vector.
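A sketch of forming the joint vector of equation 7 and computing the gain on the actual (non-advanced) sub-frame. The joint vector is generalised to an arbitrary shift d: the last d samples of the previous vector followed by the first L - d samples of the current one, so d = L/2 gives equation 7. The least-squares gain g = x^T y / (y^T y), with y = H c̄ and x the actual sub-frame target, is used here as an assumption about the form of the gain computation. Function names are hypothetical.

```python
from typing import List, Sequence


def joint_vector(prev: Sequence[float], cur: Sequence[float], shift: int) -> List[float]:
    """Last `shift` samples of the previous vector + first L - shift of the current one."""
    L = len(cur)
    if len(prev) != L or not 0 <= shift <= L:
        raise ValueError("vectors must have equal length and 0 <= shift <= L")
    return list(prev[L - shift:]) + list(cur[:L - shift])


def filtered(h: Sequence[float], c: Sequence[float]) -> List[float]:
    """y = H c: zero-state convolution of c with h, truncated to len(c)."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def codebook_gain(target: Sequence[float], h: Sequence[float],
                  joint: Sequence[float]) -> float:
    """Least-squares gain of the filtered joint vector against the actual target."""
    y = filtered(h, joint)
    energy = sum(v * v for v in y)
    if energy == 0.0:
        return 0.0
    return sum(x * v for x, v in zip(target, y)) / energy
```

For example, `joint_vector([1, 2, 3, 4], [5, 6, 7, 8], 2)` gives `[3, 4, 5, 6]`, mirroring the role of memory 622 in supplying the previous vector's latter half.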

As the excitation vectors are merely shifted during the analysis and synthesis stages in
the encoder/decoder, their internal structure remains intact: the coding of pulse
locations can be kept as in the original codec, and the structure of the parameterised
frames transferred over the transmission channel is not changed. Thus the data handling,
such as the parameter insertion/extraction routines of the encoder/decoder, requires no
modification when a traditional coder is converted to conform to the proposed solution.

As regards the LTP analysis and its closed-loop adaptive codebook search in the
advanced-excitation CELP codec, the situation is depicted in figure 9B.

Unlike in prior-art solutions, the available past excitation extends to a point 910 at
the border between the time-advanced target signal for the last sub-frame of the
previous frame and the first time-advanced target signal of the current frame. Hence the
LTP analysis is improved, as the true excitation can be at least partly utilized instead
of the mere LP residual during the closed-loop search. The same reasoning applies to the
following sub-frames, or to a scenario wherein sub-frames are not used at all and
modeling takes place in frame units only.
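The benefit for the closed-loop LTP search can be illustrated by how the adaptive codebook vector v(n) = u(n - T) is assembled when the lag T is shorter than the window. In this sketch, which assumes the LTP search window starts at n = 0, true excitation is used for every delayed sample before the point `known` up to which it is available, and the LP residual substitutes for the rest. `known = 0` mimics the conventional case of figure 9A, and a positive `known` (e.g. the shift) mimics the advanced case of figure 9B. All names are hypothetical.

```python
from typing import List, Sequence


def adaptive_vector(excitation: Sequence[float], residual: Sequence[float],
                    lag: int, length: int, known: int) -> List[float]:
    """Build v(n) = u(n - lag), n = 0..length-1.

    excitation: true excitation; excitation[-1] is the sample at n = known - 1.
    residual:   LP residual for n = 0..length-1, used where u is not yet known.
    """
    if lag < 1 or known < 0:
        raise ValueError("lag must be positive and known non-negative")
    if len(excitation) < known + lag:
        raise ValueError("not enough excitation history for this lag")
    if len(residual) < length:
        raise ValueError("residual must cover the whole window")
    v = []
    for n in range(length):
        m = n - lag  # time index of the delayed sample
        if m < known:
            # true excitation is already available at time m
            v.append(excitation[len(excitation) - known + m])
        else:
            # not yet available: substitute the LP residual
            v.append(residual[m])
    return v
```

The larger `known` is, the fewer residual samples enter v(n); with `known >= length - lag` the vector consists entirely of true excitation.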

A block diagram of the decoder of the invention is disclosed in figure 7. The decoder
receives the excitation codebook index u, the excitation gain g, the LTP coefficients T,
g2 (if present), and the LP parameters a(i). First the decoder resolves the excitation
vector from codebook 706 by utilizing index u and combines the retrieved vector with the
previous sub-frame vector (memory) 716 as explained earlier. The latter half of the
previous vector is attached to the first half of the current vector in block 714, after
which the original current vector, or at least its latter half (or an indication
thereof), is stored in memory 716 for future use. The created joint vector is then
multiplied 712 by gain g and filtered through the LTP synthesis 708 and LPC synthesis
710 filters in order to produce a synthesized speech signal ss(n) at the output.
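The two synthesis stages at the end of the decoder can be sketched as stateful filters whose memories carry over between sub-frames. The sketch assumes the common conventions 1/(1 - g2 z^-T) for LTP synthesis and 1/A(z) with A(z) = 1 + sum of a(i) z^-i for LPC synthesis. Sign conventions for a(i) differ between codecs, and fractional lags and parameter interpolation are ignored. `SynthesisFilters` is a hypothetical name.

```python
from typing import List, Sequence


class SynthesisFilters:
    """LTP synthesis y(n) = x(n) + g2*y(n-T), then LPC synthesis 1/A(z)."""

    def __init__(self, max_lag: int, order: int) -> None:
        self.max_lag = max_lag
        self.ltp_mem = [0.0] * max_lag  # past LTP outputs, most recent last
        self.lpc_mem = [0.0] * order    # past speech samples, most recent last

    def ltp(self, x: Sequence[float], lag: int, g2: float) -> List[float]:
        if not 1 <= lag <= self.max_lag:
            raise ValueError("lag out of range")
        hist = list(self.ltp_mem)
        out = []
        for sample in x:
            y = sample + g2 * hist[-lag]  # may reuse outputs of this call if lag < len(x)
            hist.append(y)
            out.append(y)
        self.ltp_mem = hist[-self.max_lag:]
        return out

    def lpc(self, e: Sequence[float], a: Sequence[float]) -> List[float]:
        if len(a) != len(self.lpc_mem):
            raise ValueError("filter order mismatch")
        hist = list(self.lpc_mem)
        out = []
        for sample in e:
            # s(n) = e(n) - sum_i a(i) * s(n - i), i = 1..p
            s = sample - sum(a[i] * hist[-1 - i] for i in range(len(a)))
            hist.append(s)
            out.append(s)
        self.lpc_mem = hist[len(hist) - len(a):]
        return out
```

Keeping the memories inside the object means the filters can be called once per sub-frame, e.g. `f.lpc(f.ltp(scaled_joint, T, g2), a)`.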

A flow diagram of the encoding method is disclosed in figure 10. Correspondingly, the
decoding flow diagram is depicted in figure 11. The flow diagrams are constructed to
further facilitate understanding of the encoder and decoder internals, although the same
basic principles can already be found in the block diagrams of figures 6 and 7. Step 1002



wo 2005/034090 



PCT/FI2004/000579 




corresponds to method start-up, where e.g. filter memories and parameters are
initialised. In step 1004 the source signal is, if not already, divided into blocks to
be parameterized. The blocks may, for example, be equivalent to the frames or sub-frames
of the aforepresented embodiment. Although the flow diagrams in figures 10 and 11 handle
the source data on a single level of block hierarchy, solutions corresponding to the
actual embodiment, where the source data was first divided into top-level blocks like
frames and then into sub-blocks (such as sub-frames) thereof, are possible. Part of the
overall analysis may thus be executed on a higher level and the rest on a lower level,
like the frame-level LPC analysis and sub-frame-level excitation vector analysis in the
disclosed embodiment. Therefore, it is not crucial to the invention what type of
hierarchy is used, or on what levels certain parameters are analysed, as long as the
excitation signal analysis exploits time advancing in relation to the actual block
division of that level. In step 1006 a new block is selected for encoding and LPC
analysis is performed, resulting in a set of LP parameters. Such parameters can be
transferred to the recipient as such or in a coded form (as line spectral pairs, for
example), as a table index or by any other suitable indication. The following step
includes LTP analysis 1008, outputting open-loop LTP parameters for the closed-loop
LTP/adaptive codebook parameter search. As described hereinbefore, a time-advanced
target signal for the excitation search is defined in step 1010. In an
analysis-by-synthesis type excitation search loop, an excitation vector is selected 1012
from the excitation codebook and used in synthesizing the speech 1014. The procedure is
repeated until the maximum number of iteration rounds is reached or the predefined error
criterion is met 1016. The excitation vector producing the smallest error is normally
the one selected. The selected vector (or another indication thereof, such as a
codebook index), or at least the part thereof corresponding to the next block, is also
stored for further use. The excitation gain is calculated in step 1018. The overall
encoding process continues from step 1006 if any unprocessed blocks are left 1020;
otherwise the method ends in phase 1022.
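The encoding loop of figure 10 can be condensed into the following toy sketch. To keep it short, it assumes that the target signal (including at least `shift` look-ahead samples) and the impulse response h are already available and fixed. The LPC/LTP analysis of steps 1006-1008 and the filter-memory updates between blocks are therefore omitted, and an exhaustive search and a least-squares gain are used as stand-ins. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def filtered(h: Sequence[float], c: Sequence[float]) -> List[float]:
    """y = H c: zero-state convolution of c with h, truncated to len(c)."""
    return [sum(h[n - k] * c[k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_codebook(target, h, codebook) -> int:
    """Steps 1012-1016 as an exhaustive search maximising (x^T y)^2 / (y^T y)."""
    best_index, best_score = 0, float("-inf")
    for i, c in enumerate(codebook):
        y = filtered(h, c)
        energy = dot(y, y)
        if energy > 0.0 and dot(target, y) ** 2 / energy > best_score:
            best_index, best_score = i, dot(target, y) ** 2 / energy
    return best_index


def ls_gain(target, h, c) -> float:
    """Step 1018 stand-in: least-squares gain of H c against the actual target."""
    y = filtered(h, c)
    energy = dot(y, y)
    return dot(target, y) / energy if energy > 0.0 else 0.0


def encode(target: List[float], sub_len: int, shift: int, h: Sequence[float],
           codebook: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    if not 0 <= shift <= sub_len:
        raise ValueError("0 <= shift <= sub_len required")
    n_blocks = (len(target) - shift) // sub_len
    prev = [0.0] * sub_len                  # step 1002: memory initialised
    encoded = []
    for b in range(n_blocks):               # loop 1006 ... 1020
        start = b * sub_len
        advanced = target[start + shift:start + shift + sub_len]   # step 1010
        index = search_codebook(advanced, h, codebook)
        current = list(codebook[index])
        joint = prev[sub_len - shift:] + current[:sub_len - shift]
        gain = ls_gain(target[start:start + sub_len], h, joint)
        encoded.append((index, gain))
        prev = current                      # stored for the next block
    return encoded
```

In a toy run with unit-pulse code vectors, a pulse at position 2 of the first sub-frame is picked up by the advanced window at its position 0, and the joint vector places it back at position 2 when the gain is computed.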

In step 1102 the decoding process is started up with the necessary initialisations etc.
Encoded data is received 1104 in blocks that are, for example, buffered for later
decoding. The current excitation vector for the block under reconstruction is
determined by utilizing the received data in step 1106, which may mean, for example,
retrieving a certain code vector from a codebook on the basis of a received codebook
index. In step 1108 the previous excitation vector (or in practice the required part,
e.g. the latter half, thereof) or an indication thereof is retrieved from the memory
and attached to the relevant first part of the current vector in phase 1110. Then the
current vector (or its more relevant latter part) is stored 1112 in the memory (as an
index, the true vector or another possible derivative/indication) to be used in
connection with the decoding of the next block. The joint vector is multiplied by the
excitation gain in phase 1114 and finally filtered through the LTP synthesis 1116 and
LPC synthesis 1118 filters. The LTP and LP parameters may have been received as such or
in coded form (as indications like a table index, or in a line spectral pair form,
etc.). If there are blocks left to be decoded 1120, the method execution is redirected
to step 1106; otherwise the method ends 1122. In many cases the step ordering presented
in the diagrams is not essential; for example, the execution order of phases 1106 and
1108, and of phases 1110 and 1112, can be reversed if appropriate.
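Steps 1106-1114 of the decoding loop rebuild the scaled joint excitation from received (index, gain) pairs, and can be sketched as below. The sketch assumes an unstructured codebook given as a list of vectors and omits the LTP/LPC synthesis of steps 1116-1118. Names are hypothetical. Given the (index, gain) pairs produced by a matching encoder, it places each pulse back at its position in the actual sub-frame.

```python
from typing import List, Sequence, Tuple


def decode_excitation(blocks: Sequence[Tuple[int, float]],
                      codebook: Sequence[Sequence[float]],
                      sub_len: int, shift: int) -> List[float]:
    """Return the gain-scaled joint excitation for all received blocks."""
    if not 0 <= shift <= sub_len:
        raise ValueError("0 <= shift <= sub_len required")
    prev: List[float] = [0.0] * sub_len            # step 1102: memory initialised
    out: List[float] = []
    for index, gain in blocks:                     # loop 1104 ... 1120
        current = list(codebook[index])            # step 1106: resolve vector
        tail = prev[sub_len - shift:]              # step 1108: previous latter part
        joint = tail + current[:sub_len - shift]   # step 1110: attach
        prev = current                             # step 1112: store for next block
        out.extend(gain * v for v in joint)        # step 1114: scale by gain
    return out                                     # 1116-1118 (filtering) omitted
```

Because only the stored latter part of the previous vector is needed, an implementation could keep just `shift` samples (or the index) in memory, as the text allows.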

Figure 12 depicts one option for the basic components of a device capable of
processing, storing, and accessing data in accordance with the invention, such as a
communications device (e.g. a mobile terminal), a data storage device, an audio
recorder/playback device, a network element (e.g. a base station, a gateway, an exchange
or a module thereof), or a computer. Memory 1204, divided between one or more physical
chips, comprises the necessary code 1216, e.g. in the form of a computer
program/application, and data 1212, i.e. the input to the proposed method, which
produces an encoded (or, correspondingly, decoded) version 1214 as output. A processing
unit 1202, e.g. a microprocessor, a DSP (digital signal processor), a microcontroller,
or programmable logic, is required for the actual execution of the method, including the
encoding and/or decoding of data 1212 in accordance with instructions 1216 stored in
memory 1204. Display 1206 and keypad 1210 are in principle optional components but are
still often needed for providing the necessary device control and data visualization
means (user interface) to the user. Data transfer means 1208, e.g. a CD/floppy/hard
drive or a network adapter, are required for handling data exchange with other devices,
for example acquiring source data and outputting processed data. Data transfer means
1208 may also denote audio parts like transducers (A/D and D/A converters, microphone,
loudspeaker, amplifiers etc.) that are used to input the audio signal for processing
and/or output the decoded signal. This scenario applies, for example, to mobile
terminals and various audio storage and/or playback devices, such as audio recorders and
dictating machines, utilizing the method of the invention. The code 1216 for the
execution of the proposed method can be stored and delivered on a carrier medium like a
floppy disk, a CD or a memory card.

Furthermore, a device performing encoding and/or decoding according to the invention
may be implemented as a module (e.g. a codec chip or circuit arrangement) included in or
merely connected to some other device. In that case the module does not have to contain
all the necessary code means for completing the overall task of encoding or decoding.
The module may, for example, receive at least some of the filter parameters, like the LP
or LTP parameters, from an external entity in addition to the unencoded or encoded data,
and determine/construct just the excitation signal by itself.


The scope of the invention can be found in the following claims. However, the utilized
devices, method steps, data structures etc. may vary significantly depending on the
current scenario, while still converging to the basic ideas of this invention. For
example, it is clear that reducing the size of the source data, though typical, is not a
necessary condition for utilizing the proposed method; the method can be used merely for
representing and analysing the source data with a number of parameters. In addition to
data transfer solutions, the invention may be applied in a single device purely for data
storage purposes. Furthermore, any kind of source data can be used in the method, not
just speech. However, with data carrying speech characteristics, i.e. data for which the
source-filter approach fits well, the modeling results are presumably most accurate.
Still further, the invention may be used in any kind of device capable of executing the
necessary processing steps; the applicable device and component types are thus not
strictly limited to the ones listed hereinbefore.
